Querying Both Time-aligned and Hierarchical Corpora with NXT Search

Authors

  • Ulrich Heid
  • Holger Voormann
  • Jan-Torsten Milde
  • Ulrike Gut
  • Katrin Erk
  • Sebastian Padó

Abstract

One obstacle to the (re-)usability and exchange of annotated corpora is the lack of standards for corpus formats and corpus query tools. This paper reports on the NXT Search tool, which was used to query two corpora with very different annotation formats. It shows that, with automatic data format conversion, both corpora can be accessed and searched with NXT Search.

Related resources

Querying Annotated Speech Corpora

This paper is concerned with querying annotated speech corpora. A growing number of such corpora is currently being created worldwide; however, their usefulness for a wider research community is restricted by the lack of standard tools for creating, editing, annotating, storing and querying them. Two solutions for these problems are presented here: the XML-based data format TASX for corpus crea...

Tools for hierarchical annotation of typed dialogue

We discuss a set of tools for annotating a complex hierarchical and linguistic structure of tutorial dialogue based on the NITE XML Toolkit (NXT) (Carletta et al., 2003). The NXT API supports multi-layered stand-off data annotation and synchronisation with timed and speech data. Using NXT, we built a set of extensible tools for detailed structure annotation of typed tutorial dialogue, collected...

Discontinuous Constituents: a Problematic Case for Parallel Corpora Annotation and Querying

In this paper, we discuss some linguistic phenomena that pose potential problems for multilevel linguistic annotation of parallel corpora in general, and specifically for data encoding with state-of-the-art multilevel corpus querying tools such as CQP. We describe the strategy we use for integrating the standard hierarchical XML representation used to annotate such phenomena in our aligned bilingual...

Querying Multi-Layer Annotation and Alignment in Translation Corpora

When dealing with linguistically annotated and aligned corpora, current research concentrates mainly on the investigation of translation properties. However, annotated and aligned corpora can be useful for practical translation as well, since translators also work with parallel corpora. Translators typically use raw sentence-aligned corpora stored in translation memories. In this paper we will s...

The NITE XML Toolkit: Data Model and Query Language

The NITE XML Toolkit (NXT) is open source software for working with language corpora, with particular strengths for multimodal and heavily cross-annotated data sets. In NXT, annotations are described by types and attribute value pairs, and can relate to signal via start and end times, to representations of the external environment, and to each other via either an arbitrary graph structure or a ...

Publication date: 2004